feat: Recover interrupted dlc protocol on restart #1817
Conversation
Very nice - good work. I think some code duplication can be avoided but otherwise this looks good to me.
coordinator/migrations/2024-01-10-105231_add_dlc_messages/up.sql
Extending the rust-dlc storage api to also be able to get non-signed channels.
Publishes an event if a peer connected successfully using the onion message handler hook.
@bonomat I've addressed all your review remarks and pushed them as fixups on top of my changes. Please have another look. |
Looks good.
Upon connecting to the peer, the last sent dlc message is looked up from the database (`last_outbound_dlc_messages`) and naively resent to the counterparty (this applies to both sides). Once the counterparty receives the last message, it checks whether it has already processed that message by looking up the hash of the dlc message in the `dlc_messages` table. If it has already seen the message it simply ignores it; if not, it continues processing it, as it apparently missed it the first time, thereby resuming the interrupted protocol.

- All dlc messages are stored in the `dlc_messages` table once successfully processed.
- The last outbound dlc message is stored in the `last_outbound_dlc_messages` table. Note that since a dlc message can get quite large, we only keep the last message around so as not to waste space.

In order to keep track of all outbound dlc messages I had to introduce the `SendDlcMessage` event, which re-routes the sending through the app / coordinator layer, which can then store the message in `dlc_messages` and `last_outbound_dlc_messages` before actually sending it to the counterparty. This way we ensure we are always able to resend the last message. Note that since we are using different data sources with independent transactional systems, it can happen that we persist things into the sled database (dlc) but not into the sqlite database (10101). This should get addressed eventually, e.g. by also storing the dlc data structures in the same sqlite database.
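The persist-and-dedupe flow described above can be sketched as follows. This is a minimal illustration, not the actual 10101 code: the struct and method names are made up, an in-memory `HashSet` of hashes stands in for the `dlc_messages` table, and an `Option<Vec<u8>>` stands in for `last_outbound_dlc_messages`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical sketch; names are illustrative, not the real API.
struct DlcMessageStore {
    seen: HashSet<u64>,             // stands in for the `dlc_messages` table
    last_outbound: Option<Vec<u8>>, // stands in for `last_outbound_dlc_messages`
}

impl DlcMessageStore {
    fn new() -> Self {
        Self { seen: HashSet::new(), last_outbound: None }
    }

    fn hash(msg: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        msg.hash(&mut h);
        h.finish()
    }

    /// Persist an outbound message *before* sending it, so it can be
    /// resent after a restart or lost connection.
    fn record_outbound(&mut self, msg: &[u8]) {
        self.seen.insert(Self::hash(msg));
        self.last_outbound = Some(msg.to_vec());
    }

    /// Returns true if the inbound message has not been seen before and
    /// should be processed; a naive resend from the counterparty whose
    /// hash is already known is simply ignored.
    fn should_process_inbound(&mut self, msg: &[u8]) -> bool {
        self.seen.insert(Self::hash(msg))
    }

    /// On reconnect, the last outbound message is looked up and resent.
    fn message_to_resend(&self) -> Option<&[u8]> {
        self.last_outbound.as_deref()
    }
}

fn main() {
    let mut store = DlcMessageStore::new();
    assert!(store.should_process_inbound(b"offer"));   // first time: process
    assert!(!store.should_process_inbound(b"offer"));  // resend: ignore
    store.record_outbound(b"accept");
    assert_eq!(store.message_to_resend(), Some(&b"accept"[..]));
}
```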
After e268c62, the order can't be copied anymore.
The node was still using its own implementation for sending dlc messages, primarily because it also had to support `SubChannelMessages`. This change removes all subchannel and on-chain protocol handling from the processing of incoming dlc messages, as those protocols aren't used. By switching to the common `NodeEvent::SendDlcMessage` we can rely on the outbound message being persisted during sending and can get rid of the `DlcMessageStep`.
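The persist-before-send routing behind `NodeEvent::SendDlcMessage` could look roughly like the sketch below. The event variant name comes from the PR; everything else (the `Node` struct, its fields, and the raw-byte message type) is a stand-in for the real database tables and peer manager.

```rust
// Hypothetical sketch of the event-based send path; only the
// `SendDlcMessage` variant name is taken from the PR.
enum NodeEvent {
    SendDlcMessage { peer: String, msg: Vec<u8> },
}

#[derive(Default)]
struct Node {
    dlc_messages: Vec<Vec<u8>>,                // `dlc_messages` table
    last_outbound: Option<(String, Vec<u8>)>,  // `last_outbound_dlc_messages` table
    sent: Vec<(String, Vec<u8>)>,              // messages handed to the wire
}

impl Node {
    fn handle_event(&mut self, event: NodeEvent) {
        match event {
            NodeEvent::SendDlcMessage { peer, msg } => {
                // Persist first, so the message survives a restart and
                // can be resent on reconnect...
                self.dlc_messages.push(msg.clone());
                self.last_outbound = Some((peer.clone(), msg.clone()));
                // ...then actually send it to the counterparty.
                self.sent.push((peer, msg));
            }
        }
    }
}

fn main() {
    let mut node = Node::default();
    node.handle_event(NodeEvent::SendDlcMessage {
        peer: "coordinator".to_string(),
        msg: b"accept".to_vec(),
    });
    assert_eq!(node.dlc_messages.len(), 1);
    assert_eq!(node.sent.len(), 1);
    assert!(node.last_outbound.is_some());
}
```

Routing every send through one event handler is what guarantees the "persist during sending" invariant the paragraph above relies on.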
The position state is no longer processed on the outbound dlc message. IMHO the logic is easier to understand this way and the code is more cohesive (at least all the code that needs to be refactored is now in one place :)
By default this is set to 2MB, according to the docs[1]. We need larger bodies than before (even in the e2e test!), because we added lots of CET adaptor signatures to the database recently[2]. I also played around with nginx's `client_max_body_size`, but that's actually irrelevant since we don't use nginx as a reverse proxy for the coordinator in the e2e tests. Related to #1719. [1]: https://docs.rs/axum/latest/axum/extract/struct.DefaultBodyLimit.html#method.disable. [2]: #1817.
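For illustration, raising the limit might look like the sketch below. Only `DefaultBodyLimit` is the real axum API referenced above; the route path, handler, and the 50 MB figure are made-up examples.

```rust
use axum::{extract::DefaultBodyLimit, routing::post, Router};

// Hypothetical handler; the real coordinator routes differ.
async fn upload(body: String) -> String {
    format!("received {} bytes", body.len())
}

fn app() -> Router {
    Router::new()
        .route("/api/upload", post(upload))
        // Override axum's default 2 MB request-body limit (example value).
        .layer(DefaultBodyLimit::max(50 * 1024 * 1024))
}
```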
This change introduces a naive mechanism to recover from an interrupted dlc protocol (e.g. lost connection or a restart) by persisting the hash of any received message and resending the last outbound dlc message on any reconnect.
TODO (Not necessarily everything in this PR)
- `Offer`, but did not process the accept yet. The protocol will still get stuck. See feat: Recover from pending offer #1822.
- `sync_position_to_subchannel` should be removed, adapted for a dlc channel.

Future work / Notes:

- `OnionMessageHandler` (`TenTenOneOnionMessageHandler`) to get a reliable hook on when a peer connected. This seems to be far more reliable than relying on the websocket connection and should be considered for triggering our rollover and expired position actions.